---
title: "SF Assessor's Office Data Exploration"
output: html_notebook
---

### How are property taxes changing over time?
In this data visualization, we look only at homes that held the homeowner's exemption continuously from 2007 to 2016. Any such home that changed hands was therefore sold from one primary resident to another. The "clean_data.rmd" notebook contains all the code for cleaning, subsetting, and preparing the data.

Loading libraries and the data:
```{r, cache=TRUE, echo=FALSE}
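library(tidyverse)  # attaches ggplot2, dplyr, and readr (for read_rds)
# The maps below also need the maps and mapproj packages to be installed
dat = read_rds("./data/compressed_assessors_data_subset.rds")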
```



```{r}
currdat = dat %>% filter(`Closed Roll Year` == 2016)

p = ggplot(data = currdat) +
  geom_histogram(aes(x = `Total Taxable Assessment`),
                 binwidth = 1000)
p

p = ggplot(data = currdat) +
  geom_histogram(aes(x = `Total Taxable Assessment`),
                 binwidth = .01)+
  scale_x_continuous(trans = "log")
p

p = ggplot(data = currdat %>% filter(!is.na(`Earliest Year`))) +
  geom_histogram(aes(x = as.numeric(`Earliest Year`)),
                 binwidth = 1)
p

p = ggplot(data = currdat %>% filter(!is.na(`Current Sales Date`))) +
  geom_histogram(aes(x = as.numeric(format(`Current Sales Date`,"%Y"))),
                 binwidth = 1)
p
```
After cleaning, the year each building was most recently sold falls after its "earliest year," which I will use as a proxy for "year built."

Now I'm curious about the neighborhoods!
```{r}
p = ggplot(data = dat_ho_2016_sub)+
  geom_boxplot(aes(x = `Analysis Neighborhood`,
               y = `Total Taxable Assessment`))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
p

p = ggplot(data = dat_ho_2016_sub)+
  geom_boxplot(aes(x = `Analysis Neighborhood`,
               y = `Total Taxable Assessment Percent Difference From 2012`))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
p

# Compare against a real Date: an unquoted 2008-01-01 is arithmetic and evaluates to 2006
p = ggplot(data = dat_ho_2016_sub %>% filter(`Current Sales Date` > as.Date("2008-01-01") &
                                               `Total Taxable Assessment Percent Difference From 2012` < 100 &
                                               `Total Taxable Assessment Percent Difference From 2012` > 0))+
  geom_boxplot(aes(x = `Analysis Neighborhood`,
               y = `Total Taxable Assessment Percent Difference From 2012`))+
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
p
```

```{r}
neighborhoods = unique(dat_ho_all_sub$`Analysis Neighborhood`)
currdat = dat_ho_all_sub %>% filter(as.numeric(format(`Current Sales Date`,"%Y")) >= 2006)

pAssessTime = function(nhood){
  # currdat is already limited to homes last sold in 2006 or later
  currdat2 = currdat %>% filter(`Analysis Neighborhood` == nhood)
  p = ggplot(data = currdat2) +
    geom_line(aes(x = `Closed Roll Year`,
                  y = `Total Taxable Assessment`,
                  group = `Parcel Number`,
                  color = `Earliest Year`))+
    #geom_point(aes(x = as.numeric(format(`Current Sales Date`,"%Y")),
    #              y = `Total Taxable Assessment`,
    #              group = `Parcel Number`))+
    scale_color_distiller(palette = "Spectral")+
    ggtitle(paste(nhood))+
    theme_dark()#+ 
    #facet_wrap(~as.factor(as.numeric(format(`Current Sales Date`,"%Y")) >= 2006))
  return(p)
}

lapply(neighborhoods, pAssessTime)
```
This shows some interesting patterns. For homes that were not sold, the assessed value often dips starting after 2008 and reaches a minimum around 2012. This is because homeowners can have a home reassessed when it loses value, as many did following the 2008 financial crisis. (CITE) If the housing market recovers, the home is reassessed upward, but only up to the maximum allowed based on the original property assessment. For homes that were sold during this period, there are often large jumps in the assessed value, which is what I expected. What surprised me was the number of these houses that nonetheless maintained a steady growth rate in assessed value. Maybe the depressed property values after 2008 kept these properties from gaining much in value. Maybe the new owners benefited from transferring their old property assessment (CITE). Maybe the property was sold to a child or grandchild, who would be allowed to keep the lower assessment (CITE).
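
The reassessment behavior described above can be sketched with a toy calculation. This is only an illustration under stated assumptions, not the Assessor's actual procedure: it assumes the taxable value in each year is the lower of the market value and a ceiling that grows from the original assessment at a fixed rate (2% per year here, an assumed cap). The function name `taxable_value` and all dollar figures are made up.

```{r}
# Toy model of a capped assessment (an assumption, not the Assessor's method)
taxable_value = function(market_value, base_value, cap_rate = 0.02){
  years_held = seq_along(market_value) - 1
  # Ceiling: the original assessment grown by the cap rate each year
  capped_value = base_value * (1 + cap_rate)^years_held
  # Taxable value is the lower of the market value and the ceiling
  pmin(market_value, capped_value)
}

# Hypothetical home assessed at $500,000 in 2007 whose market value
# dips after 2008 and recovers by 2016
years = 2007:2016
market = c(500, 480, 430, 410, 400, 395, 450, 520, 600, 650) * 1000
assessed = taxable_value(market, base_value = 500000)
data.frame(year = years, market = market, assessed = round(assessed))
```

In this toy series the assessed value follows the market down to its low point in 2012, follows it back up, and is then held at the ceiling once the market value overtakes it.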

Next, I fit a linear regression to each property's assessments to estimate its average increase in value per year between 2008 and 2016. I can then see which neighborhoods, or perhaps which characteristics of houses, correlate with increasing or stable taxable assessments.

```{r}
slopes = dat_ho_all_sub %>% 
  group_by(`Parcel Number`) %>%
  summarise(slope = lm(`Total Taxable Assessment`~`Closed Roll Year`)$coefficients[[2]])

dat_ho_all_sub_slopes = slopes %>%
  left_join(dat_ho_all_sub, by = "Parcel Number")

```
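
To make the meaning of the slope concrete, here is a small sketch on made-up data. It uses base R only, and the data frame `toy` and its columns are hypothetical rather than taken from the assessor's data. The fitted slope is in dollars per roll year.

```{r}
# Toy data: two hypothetical parcels, assessed once per roll year
toy = data.frame(
  parcel = rep(c("A", "B"), each = 4),
  year = rep(2013:2016, times = 2),
  assessment = c(100000, 102000, 104000, 106000,  # A gains 2,000 per year
                 300000, 300000, 300000, 300000)  # B stays flat
)

# Fit assessment ~ year separately for each parcel and keep the slope
toy_slopes = sapply(split(toy, toy$parcel), function(d){
  unname(coef(lm(assessment ~ year, data = d))[2])
})
toy_slopes
```

Parcel A should come out with a slope of about 2000 and parcel B with a slope of about 0, up to floating-point error.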

```{r}
ggplot(data = slopes %>% filter(`slope` < 400000 & `slope` > -100000))+
  geom_histogram(aes(x = `slope`),
                 binwidth = 1000)+
  scale_y_log10()
ggplot(data = slopes %>% filter(`slope` < 400000 & `slope` > -100000))+
  geom_histogram(aes(x = `slope`),
                 binwidth = 1000)
```


```{r}
neighborhoods = unique(dat_ho_all_sub_slopes$`Analysis Neighborhood`)

pAssessTime = function(nhood){
  currdat = dat_ho_all_sub_slopes %>% filter(`Analysis Neighborhood` == nhood)
  p = ggplot(data = currdat) +
    geom_jitter(aes(x = 1,
                  y = `slope`,
                  group = `Parcel Number`,
                  color = `Earliest Year`))+
    scale_color_distiller(palette = "Spectral")+
    ggtitle(paste(nhood))+
    theme_dark()+ 
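    # Facet on the sale year; comparing the Date itself to 2006 would compare day counts
    facet_wrap(~as.factor(as.numeric(format(`Current Sales Date`,"%Y")) >= 2006))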
  return(p)
}

lapply(neighborhoods, pAssessTime)
```

```{r}
options(digits = 15)
# the_geom is split on "(", ",", and ")"; element 2 of the result is the latitude
get_lat = function(geo){
  lat = strsplit(geo, "[(,)]")[[1]][2]
  return(as.double(lat))
}
# Element 3 of the same split is the longitude
get_long = function(geo){
  long = strsplit(geo, "[(,)]")[[1]][3]
  return(as.double(long))
}
#lats = lapply(dat_ho_all_sub_slopes$the_geom, get_lat)
#longs = lapply(dat_ho_all_sub_slopes$the_geom, get_long)
dat_ho_all_sub_slopes = dat_ho_all_sub_slopes %>% 
  rowwise() %>%
  mutate(lat = get_lat(the_geom),
         long = get_long(the_geom)) %>%
  ungroup()  # drop the rowwise grouping so later steps run on the whole table
```
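
As a quick check of the parsing logic, the sketch below runs the same split on two made-up strings. The `"(latitude, longitude)"` format is an assumption inferred from the split pattern and the indices used above, and the coordinates are illustrative. This version works on the whole vector at once, which avoids `rowwise()`.

```{r}
# Assumed format of the_geom: "(latitude, longitude)"
geo = c("(37.7749, -122.4194)", "(37.8044, -122.2712)")

# Splitting on "(", ",", or ")" leaves an empty string first,
# then the latitude, then the longitude
pieces = strsplit(geo, "[(,)]")
lat = sapply(pieces, function(x) as.numeric(x[2]))
long = sapply(pieces, function(x) as.numeric(x[3]))
data.frame(lat = lat, long = long)
```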

```{r}

sfmap = map_data("county", region = "California") %>% filter(subregion == "san francisco")
p = ggplot()+
  geom_polygon(data = sfmap, 
               aes(x=long, y = lat))
p

# Keep strictly positive values only: the color scales below use a log transform
currdat = dat_sub %>% filter(`Closed Roll Year` == 2016,
                             `Slope Percent Difference` > 0)
p = ggplot()+
  geom_point(data = currdat,
             aes(x = long,
                 y = lat,
                 color = `Slope Percent Difference`))+
  #scale_color_gradientn(trans = "log", colors = rainbow(9))+
  scale_color_distiller(palette = "Spectral", trans = "log")+
  coord_map()
p

p = ggplot()+
  geom_point(data = currdat,
             aes(x = long,
                 y = lat,
                 color = `Slope Percent Difference`))+
  scale_color_gradientn(trans = "log", colors = rev(rainbow(9)))+
  coord_map()
p

```

```{r}
neighborhoods = unique(dat_ho_all_sub_slopes$`Analysis Neighborhood`)
pLocation = function(nhood){
  currdat = dat_ho_all_sub_slopes %>% filter(`Analysis Neighborhood` == nhood)
  p = ggplot(data = currdat) +
    geom_jitter(aes(x = 1,
                  y = `slope`,
                  group = `Parcel Number`,
                  color = `Earliest Year`))+
    scale_color_distiller(palette = "Spectral")+
    ggtitle(paste(nhood))+
    theme_dark()+ 
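    # Facet on the sale year; comparing the Date itself to 2006 would compare day counts
    facet_wrap(~as.factor(as.numeric(format(`Current Sales Date`,"%Y")) >= 2006))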
  return(p)
}

lapply(neighborhoods, pLocation)
```


```{r}
dat_ho_2016_sub2  = dat_ho_2016_sub %>% filter(`Total Taxable Assessment Percent Difference From 2012` > 1000 & `Total Taxable Assessment` <3000000)

p = ggplot(data = dat_ho_all_sub %>% filter(`Parcel Number` %in% dat_ho_2016_sub2$`Parcel Number`)) +
  geom_line(aes(x = `Closed Roll Year`,
                y = `Total Taxable Assessment`,
                group = `Parcel Number`,
                color = `Year Property Built`))+
  scale_color_distiller(palette = "Spectral")+
  theme_dark()
  #theme(legend.position = "none")
p

pAssessEarliestDate = function(nhood){
  currdat = dat_ho_all_sub %>% filter(`Analysis Neighborhood` == nhood)
  p = ggplot(data = currdat %>% filter(`Parcel Number` %in% dat_ho_2016_sub2$`Parcel Number`)) +
  geom_point(aes(x = `Current Sales Date`,
                y = `Total Taxable Assessment`,
                group = `Parcel Number`,
                color = `Earliest Year`))+
  scale_x_date()+
  scale_color_distiller(palette = "Spectral")+
  theme_dark()  # no trailing "+": it would chain return(p) into the plot expression
  #theme(legend.position = "none")
  return(p)
}
lapply(neighborhoods[1:3], pAssessEarliestDate)


p = ggplot(data = dat_ho_all_sub_slopes %>% filter(`slope` < 400000 & `slope` > 0)) +
  geom_point(aes(x = `Earliest Year`,
                y = `Current Sales Date`,
                color = `slope`))+
  scale_color_distiller(palette = "Spectral")+
  scale_y_date()
p
```


### More questions to explore
#### 1. Who is rent control benefiting?
  - What are the incomes of people who are in rent control?
  - What are the rents of rent control units?
  - Are people in rent control using other city programs?
  - Can I find out if some people who have rent control own homes elsewhere?
  - Are rent-controlled homes more "derelict"?
  - Do rent-controlled units rent for a premium?
  - What about people without rent control? Are they more likely to have roommates?
  - Or be wealthier?
  - Or move more often?
  - Does rent control change or correlate with certain behaviors?

#### 2. Of people who own homes:
  - How long have they been there?
  - What taxes are people paying?
  - What are the incomes of people who own?
  - Variables to explore: income, year purchased, taxes paid
  
#### 3. Homes and apartments in general. What does turnover look like?
  - How often/where are homes being sold?
  - How often/where are homes sold to foreign buyers?
  - How often/where are homes put on AirBnB or VRBO as short-term rentals?
  - What's the historical trend of houses being converted to condos?
